**Dineshkumar Bhaskaran**

☎: +1 778 893 8274 **✉**: dineshkumarb@gmail.com

**LinkedIn**: <https://www.linkedin.com/in/dinesh-kumar-a88a2a7/> **Address**: 10104 159A ST, V4N 2P8.

**Visa status**: Canada Open Work permit; valid till 12-Oct-2025.

**Professional Experience Summary**

High Performance Computing and Systems Professional with 19+ years of collective experience in AI/ML, HPC applications, Distributed Storage, and Linux Kernel development.

* Strong aptitude for algorithm/application design and implementation. Experience in optimizing and parallelizing applications in the AI/ML, storage, and image processing domains using HPC languages like OpenCL, HIP, and CUDA on various platforms, including NVIDIA and AMD GPGPUs and Xilinx FPGAs.
* Experience working on all stages of product development, including SoC validation, board bring-up, and porting/developing embedded drivers. Strong background in Linux driver development, Linux kernel programming, system integration, and troubleshooting.
* Experience developing, enhancing, and maintaining an in-house Linux-based operating system and GCC-based toolchain for ARM/X86 32/64-bit platforms. Exposure to Clang compiler development.
* Experience with middleware and application stack development and integration. Exposure to IoT-based application development and IoT stacks like AllJoyn and IoTivity. Proficient in designing and developing test and benchmark automation frameworks using Google Test, Google Benchmark, CppUnit, Python, and Bash scripts. Experience with build automation for GCC using Linaro ABE.
* Experience with SCSI and Fibre Channel storage protocols, target mode drivers, and FC switch-based virtualization applications and products.
* Have 6+ years of experience as a Manager/Lead and 3+ years of international experience.

**Technical Skill Summary**

| Area | Details |
| --- | --- |
| AI/ML frameworks and stacks | Exposure to PyTorch, MLPerf AI benchmarks, and machine learning compilers like Apache TVM and AMD’s MIGraphX. |
| Programming languages | C, C++, HIP, CUDA, OpenCL. Exposure to Python, X86 32/64 and ARM 32 assembly languages, PTX ISA, HLSL. |
| Development Tools | ROCm Toolchain, GNU development tools, Git, Rational ClearCase, SVN, Synopsys Virtualization Platform. |
| Protocol and Protocol Stacks | Fibre Channel. Understanding of the SCSI protocol and the SCSI protocol stack in Linux. Exposure to USB and the OpenAirInterface 4G stack. |
| Storage Devices | Brocade switch series, primarily the Brocade 48K switch and pizza boxes. Flexline Array Controller, Sony AIT SCSI, IDE tape drives SDX series, L180 Tape Library. |

**Experience**

***AMD India (Senior Member of Technical Staff) Aug 2019 - till date***

**ROCm™: Machine learning.**

The ROCm (<https://github.com/RadeonOpenCompute/ROCm>) open software platform is constantly evolving to meet the needs of the machine learning (ML) and artificial intelligence (AI) community. The latest versions of ROCm include comprehensive optimization of AI and HPC workloads. These include highly optimized frameworks, libraries, and other components for various HPC applications and AI models, including large language models, on the latest generations of AMD GPU and CPU hardware. In this role, as part of a cross-geographical team, I was responsible for engaging with and supporting various customer applications, along with the following activities:

**RAPIDS development:**

* Adapting the RAPIDS (parallel implementations of PyData libraries for data science applications) DataFrame processing library (cuDF) for use on AMD platforms. In this ongoing activity, my responsibilities have included the initial estimation of the whole project and later ownership of porting, testing, and benchmarking multiple RAPIDS sub-projects like rapids-cmake and RMM.

**MLPerf Inferencing:**

* Led efforts across geographies to enable MLPerf-based inferencing infrastructure on AMD Instinct GPUs.
* Developed Python-based reference MLPerf code to support ResNet50, BERT, and YOLOv4 models for leading ROCm versions for multiple backends like PyTorch, TensorFlow, Tensor Virtual Machine (TVM), and AMD’s MIGraphX. Added support for multi-GPU and larger batch sizes as part of optimization.
* Enhanced TVM to support the latest ROCm versions and newer AMD Instinct GPUs. Fixed MIOpen multi-GPU support for the latest ROCm version in the TVM stack.
* Designed and created a lightweight inference server in C++ with a TVM backend.

**LLVM-Clang Compiler:**

* Maintainer of the ROCm Compiler Support (<https://github.com/RadeonOpenCompute/ROCm-CompilerSupport>) from Aug-2019 to Sept-2021. Apart from regular bug fixes, I added support for Windows OS, bundled code-object, performance enhancements, and internal profiling infrastructure.
* Part of a compiler infrastructure support group handling regular maintenance (fixing merge failures) and various LLVM compiler issues, particularly performance issues arising during system testing.
* Enhanced the AMD Clang driver by adding support for:
  + A multithreaded command parsing library.
  + In-memory compilation, by adding write operations to the in-memory file system in LLVM.

***Aricent India (Principal Engineer) Oct 2017 - Jul 2019***

**Accelerated Storage IO library.**

Distributed storage functions like erasure coding, encryption, and deduplication are compute-intensive operations. Aricent has developed an accelerated storage I/O library that utilizes GPUs to improve encoding and decoding processes in various erasure-code algorithms for Ceph. In addition to enhancing the I/O performance of distributed storage applications that use erasure codes, like Ceph, Aricent’s EC-offload-engine (ECoE) library frees up the underlying compute for other storage applications. My responsibilities included:

* Ideation, budgeting, procurement, recruitment, and management.
* Identification of the latest erasure code algorithms for the ECoE library. Collaborated with the Indian Institute of Science, Bangalore, to use their “minimum storage regenerating erasure code” for the Aricent solution.
* Preparing white papers, blogs, and demos, and handling client interactions. Presented the work at SDC India and SDC Santa Clara 2018. More details of the work at <https://www.youtube.com/watch?v=4QFb2Zvr6yc>.

**Open Hardware**

Open Hardware envisions reducing the total cost of ownership for telecom operators by delivering a reconfigurable, modular edge platform for high-performing software-defined radios (SDRs) using open-source solutions for wide-scale adoption. Open HW platforms consist of CPU banks, FPGAs, DSPs, and GPUs to provide generic compute resources for multiple technologies like 5G, DOCSIS, AR/VR, etc. My responsibilities as a tech lead included:

* Containerization of OpenAirInterface, a software-based open-source 4G stack, for OpenHW. Also offloaded the FFT algorithm in OpenAirInterface to an NVIDIA GPU and a Xilinx VCU1525 FPGA, and integrated it with K8S for NVIDIA GPU and Intel platforms.
* Customer collaboration and preparation of business collateral and demos (enabling voice calls through a private network).

***Canon Inc, Japan, and Canon India (Principal Engineer) Mar 2010 - Oct 2017***

**Canon Parallel Image processing library.**

This project aimed to prepare an advanced parallel library on Linux for medical image processing algorithms, with support for NVIDIA, AMD, and X86-based platforms, to augment future Canon medical equipment. The library was developed in OpenCL and highly optimized to perform faster than open-source solutions like OpenCV and ITK, as well as Canon internal solutions. My responsibilities included:

* Implementing and enhancing a complete parallelized image registration framework for both intensity-based and point-based image registration algorithms. Contributed to the parallelization of algorithms like ICP, Powell optimizer, regular gradient descent optimizer, Levenberg-Marquardt optimizer, mutual information, normalized mutual information, cross-correlation, ratio image uniformity, sum of squared differences, Euclidean distance metric, and resampler.
* Parallelizing image processing algorithms like Gaussian blur, DFT, and norm, as well as algorithms for determining image statistics, performing image normalization and image interpolation, and computing 2D histograms.
* Performance analysis and comparison of OpenCV's CUDA and OpenCL (OCL) implementations of various image processing algorithms.
* Development of a CppUnit-based test framework for Canon India in the early development phase. It could automate CSV-based testing, variation of performance parameters, generation of test results and performance charts, template generation, etc.
* Porting the DR motion software’s image noise reduction (NR) module and image enhancement (EN) module to OpenCL. This task also required reducing porting errors by finely adjusting floating-point computation in OpenCL to match HLSL, and optimizing the OpenCL implementation.
* Porting a portion of the DR software to the Cell Broadband Engine using a proprietary compiler (FOXC) for performance analysis and study.

**Canon Embedded Linux Platform.**

This project involved porting, enhancing, and maintaining a Linux-based operating system for Canon embedded products like surveillance cameras, projectors, and network scanners. The project scope ranged from porting Linux kernels (3.x and 2.x based) with support for various well-known industry SoCs, to supporting and fixing issues with GCC-based toolchains, to investigating new Linux-based technologies for Canon products. My responsibilities as Technical Lead and Manager of an eight-person team included:

* People management, project planning, execution, management, hiring, contracting, mentoring and training.
* Porting multiple Linux kernel versions (3.10.x, 2.6.x) to Canon proprietary boards (used for network surveillance products), ZC-702/ZC-706, TI BeagleBone Black, AM437x, the Synopsys Virtual platform, etc.
* Customizing the Linux platform, including enabling/porting support for OP-TEE (TrustZone for ARM) on ZC-706, porting AllJoyn (an IoT stack), and IoT application development for Canon products.
* Building, testing, enhancing, and maintaining the GCC 6.0 based cross-compiler toolchain with Multilib support for ARM v7/v8 32/64-bit and X86 32/64-bit. Responsible for creating a random C program generator for compliance testing of the GCC-based toolchain.
* Streamlining and automating testing processes for Linux kernel and Real-time kernel testing.
* Automation of Linux kernel vulnerabilities investigation. Involved in framing an organization-wide policy to establish Linux kernel contributions by fixing issues in Linux stable kernel.

***Brocade Communications, India (Software Engineer) Jan 2008 - Mar 2010***

**SAS (Storage Application Services)**

Brocade Storage Application Services (SAS) on the Brocade 7600 Fabric Application Platform Switch provides fabric-based services by integrating high-performance storage applications. SAS delivers intelligence in SANs to perform fabric-based storage services, including online data migration, storage virtualization, and continuous data replication and protection. SAS is successfully deployed with Brocade and OEM partner storage solutions like DMM, EMC RecoverPoint, and Invista. My responsibilities included:

* SAS enhancements and related development features. Worked through SAS v2.x to v3.x versions. Handling SAS-related customer issues and maintenance.
* Ownership of the Virtual Initiator module in SAS. Involved in every phase of porting, development, and enhancement of SAS (primarily the Virtual Initiator module).
* Multiple deputations in Brocade-US to coordinate SAS activities between on-site and India teams.

***Tata Elxsi (Specialist Engineer) Sept 2003 - Dec 2007***

**FCTMD (Fibre Channel Target Mode Driver).**

The project involved developing a target mode driver for LSI Logic FC HBAs, based on LSI Logic Fusion message passing technology, to act as a virtualized storage box. My responsibilities included:

* Development of the LSI Logic Fibre Channel driver to work in standalone mode with real-world devices and, when required, with a software RAID controller system, along with a character driver interface and a user interface for configuring the driver.
* Development of a kernel memory leak detector that traces kernel memory allocation interfaces like kmalloc, vmalloc, and alloc\_pages for a kernel module.

**Virtual Storage Management**

This project for the Sun Microsystems VSM product line involved developing virtual storage management solutions for MVS (mainframe) clients. The Fibre Channel tapes (3x90 series) were virtualized for infinite storage and provided high availability with Sun proprietary tape drives and libraries. My responsibilities included the implementation of the following:

* IBM 3490 and 3590 tape drive emulation (TDE) units in the VSM product for MVS clients. This involved building the basic infrastructure for command parsing and execution. Implemented 3490 commands like READFWD, WRITE, BSF, FSF, REWIND, WTM, and NOP.
* Linux character driver IOCTL interface for testing by injecting tape commands to TDE.

***Select writings***

1. Accelerated Erasure Coding: The New Frontiers of Software-Defined Storage - 2018

* Presented at SNIA SDC Santa Clara: <https://www.snia.org/events/storage-developer/presentations18>
* Presented at SNIA SDC India: <https://www.snia.org/educational-library/accelerated-erasure-coding-new-frontier-software-defined-storage-2018>
* Business white paper: <https://www.aricent.com/whitepaper/preview/17451>
* Blogs: <https://www.datacenterdynamics.com/opinions/why-erasure-coding-is-the-future-of-data-resiliency/> and <https://www.networkcomputing.com/storage/how-erasure-coding-evolving/155400422>

2. OpenHW - A new era in mobile edge computing. Business white paper - 2019
3. A novel mathematical formulation of GPU based parallel derivative computation in similarity metrics for Image Registration - 2015
4. Userspace I/O driver performance benchmarking - 2010
5. Writing a Network Device driver. Published in Linux Gazette online magazine - 2003 <http://www.tldp.org/LDP/LG/issue93/bhaskaran.html>

***Education***

1. ***Deep Learning Theory and Practice***, IISc Bangalore.
2. ***M.S. Software Systems 2006-2009***, BITS Pilani.
3. ***Bachelor of Technology, Computer Engineering 1999-2003***, Govt. Engineering College Trichur, Kerala.